Abstract:

This paper reports the result of a search for the standard model Higgs boson in events
containing four reconstructed jets associated with quarks. For masses below 135 GeV/$c^2$,
Higgs boson decays to bottom-antibottom quark pairs are dominant and result primarily in
two hadronic jets. An additional two jets can be produced in the hadronic decay of a W or
Z boson produced in association with the Higgs boson, or from the incoming quarks that
produced the Higgs boson through the vector-boson fusion process. The search is performed
using a sample of $\sqrt{s} = 1.96$ TeV proton-antiproton collisions corresponding to an
integrated luminosity of 9.45 fb$^{-1}$ recorded by the CDF II detector. The data are in
agreement with the background model, and 95% credibility level upper limits on Higgs
boson production are set as a function of the Higgs boson mass. The median expected
(observed) limit for a 125 GeV/$c^2$ Higgs boson is 11.0 (9.0) times the predicted
standard model rate.

Keywords: Higgs, All-Hadronic, b-jets

1 Introduction

The Higgs boson is the physical manifestation of the hypothesized mechanism that provides
mass to fundamental particles in the standard model (SM) [1-3]. Direct searches at the
LEP collider [4], the Tevatron [5], and the LHC [6, 7] have excluded SM Higgs boson
masses at the 95% confidence level or 95% credibility level (CL), except within the range
122-128 GeV/$c^2$. The most sensitive searches at the LHC are based on SM Higgs boson
decays to pairs of gauge bosons. At the Tevatron, searches based on Higgs boson decay
to bottom-antibottom quark pairs ($b\bar{b}$) are the most sensitive within the allowed
range. Searches in this channel offer complementary information on fermion Yukawa
couplings to the Higgs boson.

Recently, the ATLAS and CMS collaborations have reported the observation of a Higgs-like
particle at a mass of ~125 GeV/$c^2$ [6, 7], and the Tevatron has reported evidence for
a particle decaying to $b\bar{b}$ produced in association with a W/Z boson for masses
within the range 120-135 GeV/$c^2$ [8].

This paper describes a search for the Higgs boson using a data sample corresponding
to an integrated luminosity of 9.45 fb$^{-1}$ of $p\bar{p}$ collisions at
$\sqrt{s} = 1.96$ TeV recorded by the Collider Detector at Fermilab (CDF II). In this
search two production mechanisms are studied: associated vector-boson production (VH) and
vector-boson fusion (VBF). The VH channel denotes the process
$p\bar{p} \to W/Z + H \to qq' + b\bar{b}$. The VBF channel identifies the process
$p\bar{p} \to qq'H \to qq'b\bar{b}$, where the two incoming quarks each radiate a weak
boson, which subsequently fuse into a Higgs boson. In both channels, the Higgs boson
decays to $b\bar{b}$ and is produced in association with two other quarks ($qq'$). Data
are tested against the hypothesis of the presence of a Higgs boson with mass in the range
$100 < m_H < 150$ GeV/$c^2$. The $H \to b\bar{b}$ mode is the dominant decay for
$m_H < 135$ GeV/$c^2$ [9].

Searches for a Higgs boson performed in final states containing leptons, jets, and
missing energy have the advantage of smaller background, but the Higgs boson signal yields
are also very small. The all-hadronic search channel, described here, has larger potential
signal contributions but suffers from substantial QCD multi-jet background contributions.
The challenge of this channel is to accurately model and reduce the multi-jet background.
Two previous papers were published on searches for a Higgs boson in the all-hadronic
channel at CDF using data sets of 2 fb$^{-1}$ [10] and 4 fb$^{-1}$ [11]. An earlier paper
reported a search in the same channel at CDF using data collected during Run I [12].
Searches for the Higgs boson in the all-hadronic final state were also conducted at the
LEP collider in the $e^+e^- \to ZH \to q\bar{q} + b\bar{b}$ channel [4].

2 The CDF II detector 

The CDF II detector is an azimuthally and forward-backward symmetric multipurpose
detector. CDF II uses a cylindrical coordinate system with the z-axis aligned along the
proton beam direction, where $\theta$ is the polar angle relative to the z-axis and
$\phi$ is the azimuthal angle relative to the x-axis. The pseudorapidity is defined as
$\eta = -\ln[\tan(\theta/2)]$ and the transverse energy is calculated as
$E_T = E \sin\theta$.
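The two kinematic definitions above can be written out directly. The following is a minimal sketch; the function names are illustrative and not part of any CDF software.

```python
import math

def pseudorapidity(theta):
    """Return eta = -ln(tan(theta / 2)) for a polar angle theta in radians (0 < theta < pi)."""
    return -math.log(math.tan(theta / 2.0))

def transverse_energy(energy, theta):
    """Return E_T = E * sin(theta) for an energy deposit at polar angle theta."""
    return energy * math.sin(theta)
```

A particle emitted perpendicular to the beam ($\theta = \pi/2$) has $\eta = 0$ and $E_T = E$; the tracking coverage $|\eta| < 1.1$ quoted below corresponds to polar angles between roughly 37 and 143 degrees.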

The CDF II detector consists of a pair of concentric charged-particle tracking detectors
immersed in a 1.4 T solenoid magnetic field, surrounded by calorimeters and muon
detectors. The inner tracking detector, a silicon vertex detector located immediately
outside the beam pipe, provides precise three-dimensional reconstruction of
charged-particle trajectories (tracks) and is used to identify displaced vertices
associated with bottom-quark and charm-quark hadron decays. The momenta of charged
particles are measured precisely in the central outer tracker (COT), a cylindrical
multiwire drift chamber. The tracking detectors cover the pseudorapidity range
$|\eta| < 1.1$. Outside the COT are electromagnetic and hadronic calorimeters arranged in
a projective-tower geometry, covering the region $|\eta| < 3.5$, to provide energy
measurements for both charged and neutral particles. Drift chambers and scintillator
counters in the region $|\eta| < 1.5$ provide muon identification outside the
calorimeters. Luminosity is measured using low-mass gaseous Cherenkov luminosity counters
(CLC). There are two CLC modules in the CDF II detector installed at small angles in the
proton and antiproton directions, arranged in three concentric layers around the beam
pipe. More details about the CDF II detector can be found in refs. [13-15].

Jets are defined by a cluster of energy deposited in the calorimeter using a jet clustering
algorithm (JetClu) [16] with a cone of fixed radius. The JetClu algorithm begins by creating
a list of calorimeter towers above a fixed $E_T$ threshold to be used as seeds for the jet
finder. This threshold is set to 1.0 GeV. Preclusters are formed from an unbroken chain
of contiguous seed towers with a continuously decreasing tower $E_T$. If a tower is outside
a window of seven towers surrounding the seed, it is used to form a new precluster. These
preclusters are used as a starting point for cone clustering. First, the $E_T$-weighted
centroid of the precluster is found and a cone in $\eta$-$\phi$ space of radius $R$ is
formed around the centroid. For this analysis,
$\Delta R = \sqrt{(\Delta\phi)^2 + (\Delta\eta)^2} = 0.4$. Then, all towers with an $E_T$
of at least 100 MeV are incorporated into the cluster. A tower is included in a cluster if
its centroid is inside the cone; otherwise it is excluded. A new cluster center is
calculated from the set of towers within the clustering cone, again using an
$E_T$-weighted centroid, and a new cone is drawn about this position. The process of
recomputing a centroid and finding new or deleting old towers is iterated until the tower
list remains unchanged. Corrections are applied to the measured jet energy to account for
detector calibrations, multiple interactions, underlying event, and energy outside of the
jet cone [17].
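The iterative cone step described above can be illustrated with a short sketch. It covers only the centroid iteration (cone of radius 0.4, towers of at least 100 MeV, repeat until the tower list is stable); seed finding, precluster formation, and any treatment of overlapping cones are omitted. The data layout and function names are assumptions made for illustration and do not reproduce the JetClu implementation.

```python
import math

def delta_phi(phi1, phi2):
    """Signed azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def et_weighted_centroid(towers, ref_phi):
    """E_T-weighted (eta, phi) centroid; phi is averaged relative to ref_phi
    so that towers on either side of the +-pi boundary are handled consistently."""
    total = sum(t[0] for t in towers)
    eta = sum(t[0] * t[1] for t in towers) / total
    dphi = sum(t[0] * delta_phi(t[2], ref_phi) for t in towers) / total
    return eta, ref_phi + dphi

def cone_cluster(towers, start_eta, start_phi, radius=0.4, min_et=0.1, max_iter=100):
    """Iterate a cone around a starting centroid until the tower list is unchanged.

    towers: list of (E_T in GeV, eta, phi) tuples.
    Returns (indices of member towers, (eta, phi) of the final centroid)."""
    eta, phi = start_eta, start_phi
    members = None
    for _ in range(max_iter):
        new_members = [i for i, (et, t_eta, t_phi) in enumerate(towers)
                       if et >= min_et and delta_r(t_eta, t_phi, eta, phi) < radius]
        if not new_members or new_members == members:
            break  # stable tower list (or nothing inside the cone)
        members = new_members
        eta, phi = et_weighted_centroid([towers[i] for i in members], phi)
    return members or [], (eta, phi)
```

In this sketch the starting centroid stands in for the precluster centroid; in practice it would be computed from the seed towers of the precluster.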

The data for this analysis are collected using two online event selections (triggers).
Events in the first 3.0 fb$^{-1}$ are triggered by selecting those containing at least
four jets with $E_T > 15$ GeV and total calorimeter $E_T$ greater than 175 GeV. Events in
the remaining 6.45 fb$^{-1}$ are selected by requiring at least three jets with
$E_T > 20$ GeV and total calorimeter $E_T$ greater than 130 GeV.

3 Event selection 

Events with isolated leptons or missing transverse energy significance (footnote 1)
greater than 6.0, which is indicative of the presence of neutrinos, are removed to ensure
an event sample independent from other Higgs boson searches at CDF. Events containing four
or five jets with $E_T > 15$ GeV and $|\eta| < 2.4$ are selected.

To reduce the QCD multi-jet background, exactly two bottom-quark jets (b jets) are
required. Two algorithms are used to identify b jets: the SecVtx algorithm [14] and the
JetProb algorithm [18]. The SecVtx algorithm attempts to reconstruct the secondary
vertex associated with a bottom-quark (b) decay. The JetProb algorithm examines the
impact parameters of the charged-particle trajectories (tracks) within a jet and selects
those that are inconsistent with originating from the decay of a particle that occurred
in the vicinity of the primary event vertex. An additional energy correction is applied to
jets identified as b jets (section 7). Untagged jets (non-b jets) are referred to as q
jets in this paper.

1 Missing transverse energy significance is defined as the ratio of the missing transverse
energy to the square root of the total transverse energy. The missing transverse energy is
$\not\!\!E_T = |\vec{\not\!\!E}_T|$, where $\vec{\not\!\!E}_T = -\sum_i E_T^i \hat{n}_i$,
$i$ is the calorimeter tower number with $|\eta| < 3.6$, and $\hat{n}_i$ is a unit vector
perpendicular to the beam axis and pointing at the $i$th calorimeter tower.

All selected jets are ordered in $E_T$ and the four highest-$E_T$ jets are considered. The
scalar sum of the $E_T$ of the four selected jets (SumEt) is required to exceed 220 GeV,
and two of the four selected jets must be b jets.
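The selection requirements of this section can be collected into a single predicate. The sketch below is illustrative only: the event representation (a list of jet dictionaries) and the function names are assumptions, and the 2.4 bound in the jet requirement is taken to apply to the jet pseudorapidity $|\eta|$.

```python
import math

def met_significance(met, total_et):
    """Missing-E_T significance: missing E_T divided by the square root of the total E_T."""
    return met / math.sqrt(total_et)

def passes_selection(jets, met, total_et, n_isolated_leptons=0):
    """Apply the cuts quoted in section 3.

    jets: list of dicts with keys 'et' (GeV), 'eta', and 'btag' (bool).
    met, total_et: missing and total transverse energy in GeV."""
    # Veto leptons and neutrino-like events to stay independent of other searches.
    if n_isolated_leptons > 0 or met_significance(met, total_et) > 6.0:
        return False
    # Four or five jets passing the E_T and pseudorapidity requirements.
    good = [j for j in jets if j["et"] > 15.0 and abs(j["eta"]) < 2.4]
    if len(good) not in (4, 5):
        return False
    # Keep the four highest-E_T jets and require SumEt > 220 GeV.
    leading = sorted(good, key=lambda j: j["et"], reverse=True)[:4]
    if sum(j["et"] for j in leading) <= 220.0:
        return False
    # Exactly two of the four selected jets must be b-tagged.
    return sum(1 for j in leading if j["btag"]) == 2
```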

The signal-to-background ratio is enhanced by dividing the data into two independent
b-tagging categories: SS, in which both jets are tagged by SecVtx, and SJ, in which one
jet is tagged by SecVtx and the other by JetProb. If a jet is tagged by both algorithms,
it is classified as tagged by SecVtx because of the lower misidentification rate. Events
in which both jets are tagged only by JetProb are not used because the increase in
background contributions is substantially larger than that for the signal.
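The categorization rule, including the precedence of SecVtx over JetProb, can be sketched as follows. The jet representation with boolean `secvtx` and `jetprob` flags is a hypothetical layout chosen for illustration.

```python
def tag_category(jet_a, jet_b):
    """Return 'SS', 'SJ', or None for the two b-tagged jets of an event.

    Each jet is a dict with boolean flags 'secvtx' and 'jetprob'."""
    def label(jet):
        if jet["secvtx"]:
            return "S"  # SecVtx takes precedence when both algorithms tag the jet
        if jet["jetprob"]:
            return "J"
        return None

    labels = [label(jet_a), label(jet_b)]
    if labels.count("S") == 2:
        return "SS"
    if labels.count("S") == 1 and labels.count("J") == 1:
        return "SJ"
    return None  # JetProb-only pairs (and incompletely tagged pairs) are not used
```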

The signal region is defined by requirements on the invariant mass of the two b-tagged
jets ($m_{bb}$) and the two untagged jets ($m_{qq}$). The VH channel features two
intermediate resonances, one from the potential Higgs boson decay, in $m_{bb}$, and
another from the W/Z decay, in $m_{qq}$. The VBF channel shares the same $m_{bb}$
resonance, but the two q jets are not produced from the decay of a particle. However,
these two q jets tend to be produced with large $\eta$ separation, which gives a large
effective $m_{qq}$. The Higgs boson search region is defined as
$75 < m_{bb} < 175$ GeV/$c^2$ and $m_{qq} > 50$ GeV/$c^2$.



4 Signal and background samples 

Backgrounds that contribute to the $qqb\bar{b}$ final state originate from QCD multi-jet
production, top-quark pair production, single-top-quark production, $W \to q'q$ plus
$b\bar{b}$ or charm-quark pair ($c\bar{c}$) production (W+HF), $Z \to b\bar{b}, c\bar{c}$
plus jets production (Z+jets), and diboson production (WW, WZ, ZZ). About 98% of the
total background comes from QCD multi-jet production. Signal and non-QCD background
yields are estimated from Monte Carlo (MC) simulation. The W+HF and Z+jets contributions
are modeled with the ALPGEN [19] generator for simulating the boson plus parton
production, and PYTHIA [20] for modeling parton showers. The other non-QCD backgrounds and
the signal are modeled with PYTHIA [20]. All MC-simulated samples use the CTEQ5L [21]
parton distribution function (PDF) at leading order (LO) and are processed through the
full CDF II detector simulation [22], based on GEANT [23], which includes the trigger
simulation. The trigger efficiencies are corrected as described in ref. [10].

The expected signal yield in the SS (SJ) channel is 27.1 ± 4.1 (9.1 ± 1.4) for
$m_H = 125$ GeV/$c^2$. The number of selected data events for SS (SJ) is 87272 (46818). A
data-driven model is used to predict the shape of the QCD multi-jet background but not the
overall yield (section 6). The number of QCD multi-jet events in each channel is estimated
as the difference between the number of data events and the predicted number of non-QCD
events estimated with MC (neglecting the potential Higgs boson contribution). Expected
and observed event yields are summarized in table 1. In the final fit used to extract a
potential Higgs boson signal, the overall normalization of the QCD multi-jet background
is treated as an unconstrained parameter.
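The subtraction that defines the QCD multi-jet yield can be checked against the numbers quoted in the text and in table 1. The snippet below is simple arithmetic on those published values; the variable names are illustrative.

```python
# Yields quoted in the text and in table 1, per b-tag category.
data = {"SS": 87272, "SJ": 46818}
non_qcd = {"SS": 2192, "SJ": 844}
signal = {"SS": 27.1, "SJ": 9.1}   # expected Higgs yield for a 125 GeV/c^2 mass

# QCD multi-jet estimate: data minus MC-predicted non-QCD backgrounds.
qcd = {cat: data[cat] - non_qcd[cat] for cat in data}

# Fraction of the selected sample attributed to QCD, and signal relative to data.
qcd_fraction = {cat: qcd[cat] / data[cat] for cat in data}
signal_to_data = {cat: signal[cat] / data[cat] for cat in data}
```

The subtraction reproduces the QCD multi-jet entries of table 1 (85080 and 45974), with QCD accounting for roughly 97-98% of the selected events and the expected signal amounting to a few parts in $10^4$ of the sample before any multivariate selection.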

                                  SS              SJ
Total non-QCD                 2192 ± 480      844 ± 185
Data                            87272           46818
QCD multi-jet                   85080           45974
Higgs signal (125 GeV/$c^2$)    27 ± 4          9 ± 1

Table 1. Expected number of background and signal ($m_H = 125$ GeV/$c^2$) events that pass
the complete event selection for the SS and SJ b-tag categories. The number of QCD
multi-jet events is estimated as the difference between data and predicted non-QCD
backgrounds (neglecting the potential Higgs contribution). The uncertainties of the signal
and non-QCD background rate predictions include statistical and systematic rate
uncertainties, such as cross-section and integrated luminosity, as described in
section 11.

5 Search strategy 

The main challenge is to accurately model and reduce the QCD multi-jet background.
The modeling of this background is obtained from a data-driven technique described in
section 6. This avoids the need to generate large volumes of QCD multi-jet simulation
samples, which is computationally intensive and unlikely to reproduce the multi-jet
spectrum accurately.

The overwhelming QCD multi-jet background is suppressed by relying on multivariate
techniques that combine information from multiple variables to identify potential Higgs
boson events. For example, the best signal-to-background ratio using just $m_{bb}$ is
0.0007 (footnote 2). In this search, the use of multivariate techniques improves the
signal-to-background ratio to 0.006 (footnote 3), an improvement of nearly a factor of 10.
A total of eleven artificial neural networks (NN) [24, 25] are used to improve the
resolutions of variables sensitive to Higgs production and to separate the signal and
background contributions. Altogether, the use of these NNs leads to a 24% increase in
search sensitivity (footnote 4), in addition to that expected from the inclusion of
additional data with respect to the previous analysis [11].

This analysis focuses on Higgs boson decays to $b\bar{b}$, and thus it is important to have
the best possible resolution for $m_{bb}$. Section 7 describes a NN used to correct the
energies of b jets, which in turn improves $m_{bb}$. The untagged jets (q jets) associated
with each Higgs production process have unique angular and kinematic distributions.
Section 8 describes three networks that exploit these variables to identify q jets from
Higgs boson events. As gluon jets are typically wider than quark jets, jet width is useful
for separating events containing quark jets associated with Higgs-boson production from
generic jets contained within QCD multi-jet events, which are a mixture of quark and gluon
jets. Section 9 describes a technique for measuring jet width and a NN used to remove
detector and kinematic dependences that also influence the jet width.

2 Considering only events under the Higgs signal peak ($120 < m_{bb} < 140$ GeV/$c^2$);
see figure 3(a).
3 Considering only events with Higgs-NN > 0.95; see figure 8.
4 The search sensitivity is defined as the percentage reduction of the median expected
limit.

Section 10 describes the final two-stage NN that is used to extract a potential signal
contribution from the backgrounds. The two-stage NN can identify Higgs bosons produced
by three different processes simultaneously. The first stage is based on three separate NNs
trained specifically to separate backgrounds from either WH, ZH, or VBF Higgs production,
respectively, to exploit the unique characteristics of each signal process. The outputs of
the three process-specific NNs are used as inputs to a second NN. The inputs to the
first-stage networks are the corrected b-jet energies, corrected q-jet widths, outputs of
the q-jet networks, and other kinematic event variables. In the previous search [11],
exclusive VH and VBF networks were used to search for Higgs bosons in non-overlapping
signal regions. The two-stage NN, developed for this search, increases the search
sensitivity by 15%. The use of a single signal region increases the number of potential
Higgs boson signal events by 20%. Both gains are above those expected from the inclusion
of additional data alone.

6 QCD multi-jet background prediction 

Kinematic features of the QCD multi-jet background are predicted using a data-driven
method. An independent data control region is used to measure the probability for an
event with one b-tagged jet to contain an additional b-tagged jet (probe jet), referred to
as the Tag Rate Function (TRF). The TRF is applied to data samples with exactly one
jet b-tagged by SecVtx to predict the distribution of events with two b-tagged jets. The
TRF is parameterized as a function of three variables: $E_T$ of the probe jet, $\eta$ of
the probe jet, and $\Delta R$ between the tagged b jet and probe jet, and implemented as a
three-dimensional histogram. The choice of variables used to parameterize the TRF is
motivated by the kinematics of the QCD multi-jet background and the characteristics of the
b-tagging algorithms. For example, the production of b jets from gluon splitting has a
different $\Delta R$ distribution compared to direct production, and the probe jet $E_T$
and $\eta$ reflect aspects of the b-tagging algorithms and QCD multi-jet production.
Further information on the technique can be found in ref. [10].
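A three-dimensional tag-rate histogram of this kind can be sketched with NumPy. This is a minimal illustration, not the analysis code: the class name, the binning passed by the caller, and the choice to clamp out-of-range values into the edge bins are all assumptions.

```python
import numpy as np

class TagRateFunction:
    """Tag probability of a probe jet, binned in (E_T, eta, Delta R to the tagged jet)."""

    def __init__(self, et_edges, eta_edges, dr_edges):
        self.edges = [np.asarray(e, dtype=float) for e in (et_edges, eta_edges, dr_edges)]
        shape = tuple(len(e) - 1 for e in self.edges)
        self.n_probe = np.zeros(shape)    # all probe jets
        self.n_tagged = np.zeros(shape)   # probe jets that are also b-tagged

    def _bin(self, et, eta, dr):
        index = []
        for value, edges in zip((et, eta, dr), self.edges):
            i = int(np.searchsorted(edges, value, side="right")) - 1
            index.append(min(max(i, 0), len(edges) - 2))  # clamp into the edge bins
        return tuple(index)

    def fill(self, et, eta, dr, is_tagged):
        """Record one probe jet from the region used to derive the TRF."""
        b = self._bin(et, eta, dr)
        self.n_probe[b] += 1
        if is_tagged:
            self.n_tagged[b] += 1

    def rate(self, et, eta, dr):
        """Per-jet weight to apply to a probe jet in a single-tagged event."""
        b = self._bin(et, eta, dr)
        return self.n_tagged[b] / self.n_probe[b] if self.n_probe[b] > 0 else 0.0
```

In use, `fill` would be called for every probe jet in the region where the TRF is derived, and `rate` would then weight each single-tagged event so that the weighted sample predicts the double-tagged distributions.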

We use separate TRFs for SS and SJ events, which are obtained from events in the TAG
region (figure 1), defined as either $m_{qq} \in [40, 45]$ GeV/$c^2$ with
$m_{bb} \in [65, 250]$ GeV/$c^2$, or $m_{qq} > 45$ GeV/$c^2$ with
$m_{bb} \in [65, 70] \cup [200, 250]$ GeV/$c^2$. To validate the background model, the TRF
is tested in the TAG region (for self-consistency) and in two other control regions that
do not overlap with the signal region (figure 1): the CONTROL region, defined as either
$m_{qq} \in [45, 50]$ GeV/$c^2$ with $m_{bb} \in [70, 200]$ GeV/$c^2$, or
$m_{qq} > 50$ GeV/$c^2$ with $m_{bb} \in [70, 75] \cup [175, 200]$ GeV/$c^2$; and the NJET6
control region, which shares the $m_{bb}$ and $m_{qq}$ criteria of the signal region but
contains events with six reconstructed jets. The TRF predictions of different variables
are compared to data in these control regions and any shape difference is propagated as an
uncertainty of the QCD multi-jet model.
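The region definitions can be expressed as a small classifier in the $m_{bb}$-$m_{qq}$ plane. The sketch below reads the TAG and CONTROL regions as nested frames around the signal region (each strip requiring both its $m_{qq}$ and its $m_{bb}$ condition), which is an interpretation consistent with the statement that these regions do not overlap the signal region; the function names and the handling of shared boundary values are assumptions.

```python
def in_range(x, lo, hi):
    """Closed-interval test lo <= x <= hi."""
    return lo <= x <= hi

def classify_region(m_bb, m_qq, n_jets):
    """Classify an event by its dijet masses (GeV/c^2) and jet multiplicity.

    Returns 'SIGNAL', 'NJET6', 'CONTROL', 'TAG', or None."""
    # Signal mass window; six-jet events in this window form the NJET6 control region.
    if 75.0 < m_bb < 175.0 and m_qq > 50.0:
        return "NJET6" if n_jets == 6 else "SIGNAL"
    # CONTROL is tested before TAG, so shared boundary values are assigned to CONTROL.
    if (in_range(m_qq, 45.0, 50.0) and in_range(m_bb, 70.0, 200.0)) or \
       (m_qq > 50.0 and (in_range(m_bb, 70.0, 75.0) or in_range(m_bb, 175.0, 200.0))):
        return "CONTROL"
    if (in_range(m_qq, 40.0, 45.0) and in_range(m_bb, 65.0, 250.0)) or \
       (m_qq > 45.0 and (in_range(m_bb, 65.0, 70.0) or in_range(m_bb, 200.0, 250.0))):
        return "TAG"
    return None
```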

The $m_{qq}$ variable is not perfectly modeled by the TRF. The residual mismodeling is
corrected by following the procedure defined in previous searches [10, 11], which reweights
events as a function of the observed $m_{qq}$. The correction function is derived from a
fit to the ratio of the observed $m_{qq}$ over the same quantity predicted by the TRF in
events from the TAG region.
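A ratio fit of this kind might look as follows. The functional form of the correction is not stated here, so the sketch assumes a low-order polynomial fitted with `numpy.polyfit`; the function name, the binning, and the error treatment are likewise illustrative choices rather than the procedure of refs. [10, 11].

```python
import numpy as np

def fit_mqq_correction(mqq_data, mqq_pred, pred_weights, bins, degree=1):
    """Fit a polynomial to the data/prediction ratio in bins of m_qq.

    mqq_data: m_qq values of double-tagged data events.
    mqq_pred, pred_weights: m_qq values and TRF weights of single-tagged events.
    Returns a callable giving the per-event correction weight as a function of m_qq."""
    n_data, edges = np.histogram(mqq_data, bins=bins)
    n_pred, _ = np.histogram(mqq_pred, bins=edges, weights=pred_weights)
    centers = 0.5 * (edges[:-1] + edges[1:])
    ok = (n_pred > 0) & (n_data > 0)
    ratio = n_data[ok] / n_pred[ok]
    # Poisson error on the data count propagated to the ratio (prediction error neglected).
    sigma = np.sqrt(n_data[ok]) / n_pred[ok]
    coeffs = np.polyfit(centers[ok], ratio, deg=degree, w=1.0 / sigma)
    return np.poly1d(coeffs)
```

The returned function would multiply the TRF weight of each predicted event, evaluated at that event's $m_{qq}$.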

Figures 3-7 show a comparison of observed data and background predictions in the
signal region for the variables used in the final signal discrimination neural network
(section 10) after application of the $m_{qq}$ correction function. The modeling of some
variables appears to be poor but the differences are within the shape uncertainties of the
QCD multi-jet prediction.

Figure 1. Signal and control regions in the $m_{bb}$-$m_{qq}$ plane. The TAG region is used
to derive the TRF for modeling the QCD multi-jet background. The CONTROL region is used to
test and derive systematic uncertainties of this background model.



7 Energy correction for b jets 

The experimental resolution of the invariant mass of the two b jets, $m_{bb}$, has a
significant effect on the sensitivity of our search. To improve the $m_{bb}$ resolution, a
NN is trained to estimate the correction factor required to obtain the best possible
estimate of the parent b-parton energy from the measured jet energy [26].

A NN is trained for each b-tagging algorithm. Nine variables are used to train the
NN for SecVtx-tagged jets: the jet $E_T$, the jet transverse momentum
($p_T = p\sin\theta$), the $E_T$ before the application of the jet energy correction
(uncorrected jet $E_T$), the transverse mass (footnote 5), the decay length of the jet in
the transverse plane (footnote 6) and its uncertainty, the $p_T$ of the secondary vertex,
the maximum $p_T$ of the tracks inside the jet cone, and the $p_T$ sum of all tracks
within the jet cone. Six variables are used to train the NN for JetProb-tagged jets: the
jet $E_T$, the jet $p_T$, the uncorrected jet $E_T$, the transverse mass, the maximum
$p_T$ of the tracks inside the jet cone, and the $p_T$ sum of all tracks within the jet
cone.

The NNs are trained using simulated VBF events (footnote 7) with Higgs masses from
100 GeV/$c^2$ to 150 GeV/$c^2$ at 5 GeV/$c^2$ intervals. Events are required to pass the
selection described in section 3 and each b-tagged jet is required to be matched
geometrically with a b parton. The matching criterion requires the $\Delta R$ between the
b jet and b parton not to exceed 0.4. SecVtx- and JetProb-tagged jets are used to train
the SecVtx and JetProb networks, respectively.

Figure 2 shows the $m_{bb}$ distribution in simulated decays of 125 GeV/$c^2$ Higgs bosons
produced through VBF, before and after the b-jet energies are corrected. The mean shifts
from 116 GeV/$c^2$ to 128 GeV/$c^2$ and the root mean square (RMS) from 15.6 GeV/$c^2$ to
13.7 GeV/$c^2$. The resolution, defined as the ratio between the RMS and the mean, shifts
from 0.13 to 0.11, an improvement of 18%.
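The quoted resolutions follow directly from the means and RMS values given above, as the short calculation below shows. How the 18% figure is normalized is not stated; the comments note one reading that reproduces it, which should be taken as an assumption.

```python
mean_before, rms_before = 116.0, 15.6   # GeV/c^2, before the b-jet energy correction
mean_after, rms_after = 128.0, 13.7     # GeV/c^2, after the correction

res_before = rms_before / mean_before   # about 0.134, quoted as 0.13
res_after = rms_after / mean_after      # about 0.107, quoted as 0.11

# Relative change computed from the unrounded resolutions, normalized to the
# uncorrected value: about 20%.
gain_unrounded = (res_before - res_after) / res_before

# Using the rounded values and normalizing to the corrected resolution gives
# about 18%, matching the quoted figure (an assumed reading of the text).
gain_rounded = (0.13 - 0.11) / 0.11
```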

The b-jet energy corrections should be independent of the sample used to train and
test the NN. The NN training and testing was repeated using WH and ZH events and
similar results were obtained.

8 Untagged jets neural network 

The angular distributions of untagged jets (q jets) from VH or VBF differ from the
angular distributions of generic jets contained within QCD multi-jet background events.
Identification of q jets can therefore help to separate signal events from QCD multi-jet
background contributions. In particular, the $m_{qq}$ obtained from q jets associated with
the WH and ZH processes is constrained by the mass of the W and Z, respectively. The q
jets produced in VBF events are typically widely separated in both $\phi$ and $\eta$,
while the q jets in QCD multi-jet events tend to exhibit a large difference in $\phi$ and
a small difference in $\eta$. Three networks [24], referred to as the qq_WH NN, qq_ZH NN,
and qq_VBF NN, are trained to separate events with q jets originating from WH, ZH, and VBF
production from background events. The input variables are $m_{qq}$, $\Delta\phi_{qq}$,
$\Delta\eta_{qq}$, $\Delta R_{qq}$, and the transverse momenta of each q jet with respect
to the total momentum of the system. The networks are trained using Higgs MC to model the
signal and the data-driven prediction for QCD multi-jets to model the background.

5 The transverse mass is defined as $(p_T/p)M$, where $M$ is the invariant mass of the jet.
6 The decay length is defined as the transverse distance between the primary vertex and the
reconstructed secondary vertex in the SecVtx b-tagged jet.
7 All NNs in this paper are trained using statistically independent samples.

Figure 2. Comparison of the $m_{bb}$ distribution in simulated decays of 125 GeV/$c^2$
Higgs bosons produced through VBF, before and after the b-jet energy correction. The black
arrow indicates $m_H = 125$ GeV/$c^2$.

9 Jet width

The untagged jets (q jets) associated with the QCD multi-jet background are a mixture of
quark and gluon jets, whereas the q jets associated with the Higgs signal are predominantly
quark jets. As gluon jets tend to be broader than quark jets, jet width is another useful
variable for distinguishing potential Higgs events from the background. We define jet
widths measured within the calorimeter ($\langle R \rangle_{\rm CAL}$) and tracker
($\langle R \rangle_{\rm TRK}$) as
where $\Delta R$(tower, jet) ($\Delta R$(track, jet)) is the angular distance between the
jet axis and the calorimeter tower (track). All calorimeter towers within the jet cone of
$\Delta R < 0.4$ are used in the $\langle R \rangle_{\rm CAL}$ calculation. All tracks with
$p_T > 1$ GeV/$c$ and within the jet cone of $\Delta R < 0.4$ are used in the calculation
of $\langle R \rangle_{\rm TRK}$.
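As an illustration of how such a width could be computed, the sketch below uses a weighted root-mean-square angular distance of the constituents from the jet axis, with tower $E_T$ (track $p_T$) as the weight. The defining expressions are not reproduced in this text, so both the weighting and the normalization to the summed constituent weight are assumptions, as are the function names.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def jet_width(constituents, jet_eta, jet_phi, cone=0.4, min_weight=0.0):
    """Weighted RMS angular distance of constituents from the jet axis (assumed form).

    constituents: list of (weight, eta, phi); weight is tower E_T or track p_T.
    Only constituents inside the cone and with weight above min_weight are used
    (min_weight=1.0 mimics the p_T > 1 GeV/c track requirement)."""
    numerator = 0.0
    total_weight = 0.0
    for weight, eta, phi in constituents:
        dr = delta_r(eta, phi, jet_eta, jet_phi)
        if weight > min_weight and dr < cone:
            numerator += weight * dr * dr
            total_weight += weight
    return math.sqrt(numerator / total_weight) if total_weight > 0.0 else 0.0
```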

The jet width also varies as a function of jet $E_T$, jet $\eta$, and the number of primary
vertices ($N_{\rm vtx}$), and is parameterized by a neural network fit. These dependences
are removed by rescaling the measured jet widths to a common reference (that for a jet
with $E_T = 50$ GeV, $\eta = 0$, and $N_{\rm vtx} = 1$) using the procedure described in
ref. [11]. The NN function that parameterizes the variation of jet width with jet $E_T$,
jet $\eta$, and $N_{\rm vtx}$ is trained on a sample of untagged quark jets from the
hadronic W boson decays in $t\bar{t} \to b\bar{b}\,\ell\nu\,qq$ ($\ell = e, \mu$) events.
The highest-$E_T$ untagged-jet pair whose invariant mass is in the range 50-110 GeV/$c^2$
is assumed to originate from the hadronic W boson decay. Separate networks are trained for
MC and data. After rescaling, any differences in the jet width are assumed to be associated
with the type of parton that initiated the jet. The $t\bar{t}$ MC and data q-jet width
distributions are found to agree after rescaling the measured jet widths. To check that
the jet width rescaling can be applied to non-$t\bar{t}$ samples, the rescaling is also
applied to the q jets in WH, ZH, and VBF MC events. The mean rescaled jet width in all
samples is consistent with the width observed in the $t\bar{t}$ sample, which verifies the
independence of the corrections with respect to jet $E_T$, $\eta$, and $N_{\rm vtx}$.

A systematic uncertainty is assigned by adding an offset to the rescaled $t\bar{t}$ MC jet
width and comparing the $\chi^2$/degree of freedom ($\chi^2$/d.o.f.) of the shifted MC and
$t\bar{t}$ data distributions with that of the unshifted MC and data. The uncertainty is
defined by the offset that changes the $\chi^2$/d.o.f. by ±1 unit. The calorimeter jet
width uncertainty is ±2.6% and the tracker jet width uncertainty is ±5.5%.

Figures 4(c)-4(f) show the corrected jet width distributions of the untagged jets
measured by the calorimeter and tracker. The Higgs signal tends toward lower jet width
values, indicating quark-like jets, whereas the QCD multi-jet background tends toward
higher jet width values, indicating a mixture of quark and gluon jets. The jet width
distributions of the Higgs signal differ from those of the background, which shows that
this variable is useful for the Higgs boson search.

10 Classification of Higgs boson events 

A final NN is trained to optimize the separation of signal and background [24], which 
incorporates information from kinematic and angular jet variables, jet widths, event shapes, 
and the outputs of the untagged jets (q jets) NNs. The energies of the b jets and widths 
of the q jets are corrected as described in sections 7 and 9, respectively. As the WH, ZH, 
and VBF processes have different kinematics, dedicated WH, ZH, and VBF networks are 
trained separately for each process, resulting in three outputs. The outputs of the process- 
specific NNs are combined as inputs to a grand NN, referred to as the Higgs-NN. The 
output of the Higgs-NN is used to obtain Higgs search limits. 
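The two-stage arrangement can be sketched structurally with a toy forward pass. The networks below are untrained, with random weights, and serve only to show how three process-specific outputs feed a second-stage network; the class, the hidden-layer sizes, and the activation functions are assumptions, while the input counts (17 for the WH and ZH networks, 18 for the VBF network) follow the variable counts given later in this section.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyMLP:
    """One-hidden-layer network with a single sigmoid output (random, untrained weights)."""

    def __init__(self, n_inputs, n_hidden, rng):
        self.w1 = rng.normal(size=(n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(size=n_hidden)
        self.b2 = 0.0

    def __call__(self, x):
        hidden = np.tanh(np.asarray(x, dtype=float) @ self.w1 + self.b1)
        return sigmoid(hidden @ self.w2 + self.b2)

rng = np.random.default_rng(0)
# First stage: one network per production process.
first_stage = {
    "WH": TinyMLP(17, 8, rng),
    "ZH": TinyMLP(17, 8, rng),
    "VBF": TinyMLP(18, 8, rng),
}
# Second stage: combines the three process-specific outputs into one discriminant.
higgs_nn = TinyMLP(3, 4, rng)

def higgs_nn_output(vh_inputs, vbf_inputs):
    """Evaluate the two-stage discriminant for one event."""
    scores = np.array([
        first_stage["WH"](vh_inputs),
        first_stage["ZH"](vh_inputs),
        first_stage["VBF"](vbf_inputs),
    ])
    return higgs_nn(scores)
```

One motivation for such a design is that a single output distribution can be fitted for all three production modes at once, rather than maintaining separate, non-overlapping signal regions.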

The input variables for the process-specific WH, ZH, and VBF networks must fulfill two
criteria: they must provide good separation between signal and background, and they must
be well modeled by the TRF. The discriminating variables for the WH-NN and ZH-NN training
are $m_{bb}$, $m_{qq}$, the cosine of the leading-jet scattering angle in the four-jet
rest frame ($\cos\theta_3$) [27], the $\chi$ variable (footnote 8) [11], the calorimeter
jet widths $\langle R \rangle_{\rm CAL}$ of the first and second leading untagged jets, the
tracker jet widths $\langle R \rangle_{\rm TRK}$ of the first and second leading untagged
jets, aplanarity, sphericity, centrality [20], $\Delta R$ of the two b-tagged jets,
$\Delta R$ of the two untagged jets, $\Delta\phi$ of the two b-tagged jets, $\Delta\phi$
of the two untagged jets, and the qq_WH and qq_ZH network outputs (section 8). Not all
variables used in the WH and ZH network training provide good separation between signal
and background for VBF. For the VBF-NN training, $\cos\theta_3$, the aplanarity, and the
$\Delta\phi$ of the two untagged jets are removed; the $\eta$ of the first ($\eta_{q_1}$)
and second ($\eta_{q_2}$) leading untagged jet, the $\Delta\eta$ of the two untagged jets
($\Delta\eta_{qq}$), the invariant mass of the four-jet system, and the sum of the four
jets' momenta along the z direction are added; and the qq_WH and qq_ZH network outputs are
replaced by the qq_VBF NN output. Overall, the VH (VBF)-NN is trained with 17 (18)
variables, of which $m_{bb}$ and $m_{qq}$ ($m_{qq}$ and $\Delta\eta_{qq}$) are the most
discriminating. Figures 3-7 show the distributions of these variables for the Higgs signal
and the background. They clearly show that the background is dominated by QCD multi-jet
production. Each variable demonstrates some ability to distinguish a Higgs boson signal
from the background. Some variables, such as those in figures 3(c) and 7(a), appear to
have some mismodeling of the background. However, the observed differences are within the
shape uncertainties of the TRF QCD multi-jet prediction. These shape uncertainties are
derived by testing these variables in the TAG, CONTROL, and NJET6 control regions and
propagating any difference as a shape uncertainty.

8 The $\chi$ variable is the minimum of $\chi_W$ and $\chi_Z$, where
$\chi_W = \sqrt{(M_W - m_{qq})^2 + (M_H - m_{bb})^2}$ and a similar expression exists for
$\chi_Z$.
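The $\chi$ variable of footnote 8 measures how close an event lies to the expected (vector boson, Higgs boson) mass point, and can be computed as below. The sketch takes the second mass in the footnote to be the Higgs boson mass hypothesis; the W and Z masses are approximate values supplied here for illustration and are not quoted in the text.

```python
import math

M_W = 80.4   # GeV/c^2, approximate W boson mass (assumed value, not quoted in the text)
M_Z = 91.2   # GeV/c^2, approximate Z boson mass (assumed value, not quoted in the text)

def chi_variable(m_qq, m_bb, m_higgs):
    """Minimum distance in the (m_qq, m_bb) plane to the (M_W, M_H) or (M_Z, M_H) point."""
    chi_w = math.hypot(M_W - m_qq, m_higgs - m_bb)
    chi_z = math.hypot(M_Z - m_qq, m_higgs - m_bb)
    return min(chi_w, chi_z)
```

Signal-like VH events are expected at small $\chi$, while the QCD multi-jet background, which has no resonant structure in either mass, populates a broad range of values.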
Figure 3. The QCD multi-jet background prediction (SS b-tag category) for (a) $m_{bb}$,
(b) $m_{qq}$, (c) the invariant mass of the four-jet system, and (d) the sum of the momenta
along the z direction for each of the four jets in the search signal region. The $m_{qq}$
distribution is obtained after the application of the $m_{qq}$ correction described in
section 6. The black histograms are the TRF-derived predictions for the QCD multi-jet
background, and the black triangles are the data. The yellow histogram shows the
MC-predicted non-QCD background, which is the sum of $t\bar{t}$, single-top, Z+jets, W+HF,
and diboson contributions. The predicted distributions for WH events (red), ZH events
(blue), and VBF events (green) for a Higgs mass of $m_H = 125$ GeV/$c^2$, scaled by a
factor of 1000, are also shown.

Figure 4. The QCD multi-jet background predictions for the SS b-tag category of (a) the
cosine of the leading-jet scattering angle in the four-jet rest frame [27], (b) the $\chi$
variable [11], (c) the calorimeter jet width of the first and (d) second leading untagged
jet, and (e) the tracker jet width of the first and (f) second leading untagged jet.
Descriptions of the signal and background histograms can be found in figure 3.

Figure 5. The QCD multi-jet background prediction for the SS b-tag category of (a) the
$\eta$ of the first leading untagged jet and (b) second leading untagged jet, (c)
$\Delta\eta$ of the two untagged jets, (d) the aplanarity [20], (e) the sphericity [20],
and (f) centrality [20]. Descriptions of the signal and background histograms can be found
in figure 3.

Figure 6. The QCD multi-jet background prediction for the SS b-tag category of (a) the
$\Delta R$ of the two b-tagged jets and (b) of the two untagged jets, (c) the $\Delta\phi$
of the two b-tagged jets and (d) of the two untagged jets. Descriptions of the signal and
background histograms can be found in figure 3.

Figure 7. The QCD multi-jet background prediction for the SS b-tag category of the (a)
qq_WH NN, (b) qq_ZH NN, and (c) qq_VBF NN (section 8). Descriptions of the signal and
background histograms can be found in figure 3.
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The Wif-NN, Zff-NN, and VBF-NN are trained using dedicated MC samples for signal 
modeling. A small subset (10%) of single-tagged jet events, after random selection and 
application of the TRF, is used as the QCD multi-jet training sample. The remaining 90% 
of events are reserved for modeling the NN output distributions. As the shapes of the 
kinematic distributions are found to be consistent for both 6-tagging categories, the NN is 
trained using SS events. 
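
The 10%/90% division of the single-tagged sample described above can be sketched as follows. This is a minimal illustration and not the analysis code: the function name `split_training_sample`, the representation of events as list entries, and the fixed random seed are all assumptions made for the example.

```python
import random


def split_training_sample(events, train_fraction=0.10, seed=12345):
    """Randomly split events into a training subset and a modeling subset.

    `events` is a hypothetical stand-in for the TRF-weighted single-tagged
    events. The seed is fixed only to make the example reproducible.
    """
    rng = random.Random(seed)
    indices = list(range(len(events)))
    rng.shuffle(indices)
    n_train = int(round(train_fraction * len(events)))
    train_idx = set(indices[:n_train])
    # Keep the original event order inside each subset.
    train = [ev for i, ev in enumerate(events) if i in train_idx]
    model = [ev for i, ev in enumerate(events) if i not in train_idx]
    return train, model


# Toy usage with integer event identifiers (invented data).
toy_events = list(range(1000))
train_sample, model_sample = split_training_sample(toy_events)
```

Keeping the two subsets disjoint means the events used to train the network are not reused to model its output distribution.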

The search focuses on Higgs boson mass hypotheses in the range $100 \le m_H \le 150$ GeV/$c^2$
at 5 GeV/$c^2$ intervals. The sensitivity of the search is improved by using
separate trainings at three specific Higgs boson masses: 100 GeV/$c^2$, 120 GeV/$c^2$, and
140 GeV/$c^2$. For each Higgs boson mass hypothesis, we choose the training that gives the
best search sensitivity.
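
The choice of training for each mass hypothesis can be illustrated with a short sketch. The mass grid and the three training masses are taken from the text; the figure of merit `toy_expected_limit` is a hypothetical placeholder and does not represent the expected limits of this search.

```python
TRAINING_MASSES = (100, 120, 140)             # GeV/c^2, from the text
MASS_HYPOTHESES = tuple(range(100, 151, 5))   # 100, 105, ..., 150 GeV/c^2


def toy_expected_limit(training_mass, hypothesis_mass):
    """Placeholder figure of merit (smaller is better).

    ASSUMPTION: invented for illustration only. In a real analysis this
    would be the expected limit obtained with the given training.
    """
    return 1.0 + abs(training_mass - hypothesis_mass) / 10.0


def choose_training(hypothesis_mass, figure_of_merit=toy_expected_limit):
    """Return the training mass with the best (smallest) figure of merit."""
    return min(TRAINING_MASSES,
               key=lambda m: figure_of_merit(m, hypothesis_mass))


training_for_mass = {m: choose_training(m) for m in MASS_HYPOTHESES}
```

With this toy metric the selection reduces to the nearest training mass. The criterion stated in the text is search sensitivity, which need not coincide with the nearest mass.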

Only variables that are well modeled by the TRF are used to train the WH-NN, ZH-NN,
and VBF-NN. As a further validation, the modeled outputs of the WH, ZH, and VBF
networks are compared to TAG events in data. The WH and ZH networks are found to be
well modeled, but the VBF-NN requires an additional correction, analogous to the
reweighting performed to correct $m_{qq}$ (section 6). Figure 8 shows the Higgs-NN distribution
for a 125 GeV/$c^2$ Higgs boson in events with both b jets tagged by SecVtx, after the VBF-NN
correction function is applied. The histogram shows the data, a stacked distribution of the
backgrounds, and the Higgs boson signal scaled by 1000. Because the QCD multi-jet background
is large, the non-QCD contributions and the QCD uncertainty are difficult to see there. The
lower plot, in which the QCD prediction is subtracted from the data, makes the quality of the
background modeling easier to judge: the QCD uncertainty is as large as the total non-QCD
contribution, and the QCD-subtracted data are consistent with the non-QCD background
within the QCD uncertainty.
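
The QCD-subtracted comparison can be sketched per bin as below. All numbers are invented for illustration. Combining the Poisson variance of the data with the QCD systematic uncertainty in quadrature is an assumption of this sketch, not a statement of the procedure used in the analysis.

```python
import math


def qcd_subtracted(data, qcd):
    """Data minus the QCD multi-jet prediction, bin by bin."""
    return [d - q for d, q in zip(data, qcd)]


def pulls(data, qcd, non_qcd, qcd_unc):
    """(data - QCD - non-QCD) / sigma per bin.

    ASSUMPTION: sigma^2 = Poisson variance of the data + (QCD uncertainty)^2.
    """
    result = []
    for d, q, n, u in zip(data, qcd, non_qcd, qcd_unc):
        sigma = math.sqrt(d + u * u)
        result.append((d - q - n) / sigma)
    return result


# Toy three-bin example (invented numbers).
data = [10250.0, 5120.0, 1030.0]
qcd = [10000.0, 5000.0, 1000.0]
non_qcd = [200.0, 100.0, 25.0]
qcd_unc = [200.0, 100.0, 25.0]

residual = qcd_subtracted(data, qcd)
bin_pulls = pulls(data, qcd, non_qcd, qcd_unc)
```

Pulls well below one in every bin would indicate that the QCD-subtracted data agree with the non-QCD prediction within the assumed uncertainties.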

11 Systematic uncertainties 

This search considers systematic effects on the normalization (rate systematic
uncertainties) and on the output shape (shape systematic uncertainties) of the Higgs-NN for
the signal and background. The rate systematic uncertainties are defined as the variations of
the number of events that pass the selection requirements. The shape-related systematic
uncertainties are expressed as fractional changes in the binned distributions.

The systematic effects on the normalization of the Higgs boson signal and non-QCD
backgrounds are the uncertainties on the jet energy scale (JES) [17], the PDF, the b-tagging
scale factor, initial- and final-state radiation (ISR and FSR), the trigger efficiency, the
integrated luminosity, and the cross sections [5]. The effects on the shape of the Higgs boson
signal and non-QCD backgrounds are the uncertainties on the JES, ISR, FSR, and the jet width.
The shape uncertainties are evaluated by varying each source by $\pm 1\sigma$ and propagating
the change through the event selection and the Higgs-NN. Table 2 summarizes all systematic
uncertainties.
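
As an illustration of the two kinds of uncertainty, the sketch below derives a rate change and per-bin fractional shape changes from a nominal histogram and a $+1\sigma$ shifted histogram. The bin contents are invented. Removing the overall yield change before computing the shape variation is an assumption made here to keep the two effects separate.

```python
def rate_uncertainty(nominal, shifted):
    """Fractional change in the total number of selected events."""
    n_nominal = sum(nominal)
    return (sum(shifted) - n_nominal) / n_nominal


def shape_uncertainty(nominal, shifted):
    """Per-bin fractional change after scaling `shifted` to the nominal yield.

    ASSUMPTION: the yield change is factored out so that it is counted
    only in the rate uncertainty.
    """
    scale = sum(nominal) / sum(shifted)
    return [(s * scale - n) / n if n > 0 else 0.0
            for n, s in zip(nominal, shifted)]


# Toy Higgs-NN templates (invented numbers).
nominal = [10.0, 20.0, 40.0, 30.0]
shifted_up = [11.0, 21.0, 44.0, 34.0]

rate_up = rate_uncertainty(nominal, shifted_up)     # +10% in this toy
shape_up = shape_uncertainty(nominal, shifted_up)   # fractional, per bin
```

The same two functions applied to a $-1\sigma$ template would give the downward variation.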

Only shape uncertainties are considered for the QCD multi-jet component; its
normalization is unconstrained. The TRF QCD shape uncertainties arise from uncertainties
in the interpolation and in the $m_{qq}$ and VBF-NN correction functions. The TRF shape
uncertainty is defined as the shape difference between the nominal QCD shape and a
systematically shifted version.

Figure 8. Higgs-NN distribution for a 125 GeV/$c^2$ Higgs boson in events with both b jets tagged by
SecVtx, after the VBF-NN correction function is applied. All backgrounds are stacked and the
superimposed Higgs boson signal is scaled by 1000. As the QCD multi-jet background is large,
the difference between the data and the QCD multi-jet prediction is also plotted, together with a
stacked plot of the non-QCD background and the QCD multi-jet systematic uncertainty. Both plots
show that the data are consistent with the background, especially at large Higgs-NN score where
the Higgs signal peaks.

The interpolation uncertainty accounts for the sample dependence of the TRF. The nominal
TRF is measured in the TAG region and applied to the signal region. Another TRF is
measured in the CONTROL region (figure 1) and is also applied to the signal region. The shape
difference between the predictions of the nominal TAG TRF and the CONTROL TRF defines the
interpolation uncertainty.

The $m_{qq}$ and VBF-NN distributions require an additional correction to improve their
TRF modeling (sections 6 and 10). The nominal correction functions are measured in
the TAG region, and alternatives are measured in the CONTROL ($m_{qq}$) and NJET6
(VBF-NN) regions. The shape difference between using the nominal and the alternative
correction function defines the correction function shape uncertainty.
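
The definition of a TRF shape uncertainty as the difference between two predictions for the same events can be sketched as follows. The tag-rate parameterizations `trf_tag` and `trf_control`, their dependence on jet $E_T$ alone, and the toy events are all hypothetical. The TRF parameterization used in the analysis is not reproduced here.

```python
def weighted_histogram(values, weights, edges):
    """Fill a histogram with per-event weights (upper bin edges excluded)."""
    counts = [0.0] * (len(edges) - 1)
    for value, weight in zip(values, weights):
        for i in range(len(counts)):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += weight
                break
    return counts


def shape_difference(nominal, alternative):
    """Alternative minus nominal after scaling to the nominal area."""
    scale = sum(nominal) / sum(alternative)
    return [a * scale - n for n, a in zip(nominal, alternative)]


# HYPOTHETICAL tag-rate functions of jet E_T in GeV.
def trf_tag(et):
    return 0.020 + 0.00010 * et


def trf_control(et):
    return 0.018 + 0.00012 * et


# Toy events: (jet E_T, NN score), invented values.
events = [(30.0, 0.1), (50.0, 0.3), (80.0, 0.6),
          (120.0, 0.9), (60.0, 0.4), (100.0, 0.8)]
edges = [0.0, 0.5, 1.0]
scores = [score for _, score in events]

nominal = weighted_histogram(scores, [trf_tag(et) for et, _ in events], edges)
alternative = weighted_histogram(scores, [trf_control(et) for et, _ in events], edges)
interpolation_unc = shape_difference(nominal, alternative)
```

Because the alternative prediction is scaled to the nominal area, the difference sums to zero and carries shape information only, in line with the QCD normalization being left unconstrained.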

Source                                 Uncertainty
--------------------------------------------------------
TRF (QCD multi-jet) uncertainties
  TRF interpolation                    Shape
  TRF $m_{qq}$ correction              Shape
  TRF VBF-NN correction                Shape
Signal and background uncertainties
  Luminosity                           ±6% Rate
  Trigger                              ±3.55% Rate
  SecVtx+SecVtx                        ±7.1% Rate
  SecVtx+JetProb                       ±6.4% Rate
  Jet energy correction                ±9% Rate, Shape
  Jet width                            Shape
Cross section uncertainties
  $t\bar{t}$ and single-top            ±7% Rate
  Diboson (WW/WZ/ZZ)                   ±6% Rate
  W+HF and Z+jets                      ±50% Rate
  WH/ZH                                ±5% Rate
  VBF                                  ±10% Rate
Signal uncertainties
  PDF                                  ±2% Rate
  ISR/FSR                              ±3% Rate, Shape

Table 2. Summary of all systematic uncertainties. 



12 Results 

The Higgs-NN output distribution in data is compared to the background predictions. No
evidence of a Higgs boson signal is found, nor any disagreement between the predicted
background and the observed data. Upper limits on the Higgs boson production cross section
are calculated at the 95% CL using a Bayesian method with a non-negative flat prior for the
signal cross section. We integrate over Gaussian priors for the systematic uncertainties,
truncated to ensure that no prediction is negative, and incorporate correlated rate and shape
uncertainties as well as uncorrelated bin-by-bin statistical uncertainties [28]. Figure 9 and
table 3 show the limits from the combination of the SS and SJ b-tagging categories. The
observed limits agree with the expected limits.
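
To make the flat-prior Bayesian limit concrete, the sketch below computes a 95% upper limit for a deliberately simplified case: a single-bin counting experiment with no systematic uncertainties. This is not the procedure of [28], which uses the binned Higgs-NN output and integrates over nuisance parameters. The function names, the integration grid, and the inputs are assumptions of the example.

```python
import math


def log_poisson(n, mean):
    """Logarithm of the Poisson probability of n events for a given mean > 0."""
    return n * math.log(mean) - mean - math.lgamma(n + 1)


def bayesian_upper_limit(n_obs, background, signal_sm,
                         cl=0.95, mu_max=50.0, steps=20000):
    """Upper limit on mu = sigma / sigma_SM with a flat prior for mu >= 0.

    SIMPLIFICATION: one bin, no nuisance parameters. `background` must be
    positive. `mu_max` truncates the integration range.
    """
    if background <= 0.0:
        raise ValueError("background must be positive in this sketch")
    dmu = mu_max / steps
    mus = [i * dmu for i in range(steps + 1)]
    posterior = [math.exp(log_poisson(n_obs, mu * signal_sm + background))
                 for mu in mus]
    # Cumulative trapezoidal integral of the unnormalized posterior.
    cumulative = [0.0]
    for i in range(1, len(posterior)):
        area = 0.5 * (posterior[i] + posterior[i - 1]) * dmu
        cumulative.append(cumulative[-1] + area)
    total = cumulative[-1]
    for mu, integral in zip(mus, cumulative):
        if integral >= cl * total:
            return mu
    return mu_max


# With zero observed events the limit is -ln(0.05) / signal_sm,
# independent of the background.
limit_zero_obs = bayesian_upper_limit(0, background=1.0, signal_sm=1.0)
```

The zero-event case provides a convenient closed-form check of the numerical integration.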

Table 3. Expected and observed 95% CL upper limits for the combined SS and SJ channels. The
limits are relative to the expected Higgs cross section.

Figure 9. Upper limits at 95% CL for the combined SS and SJ channels: the expected and observed
limits are plotted as a function of the Higgs boson mass. The limits are relative to the expected
SM Higgs boson production, which includes the $H \to b\bar{b}$ branching ratio.

13 Summary

A search for the Higgs boson is performed in the all-hadronic final state using 9.45 fb$^{-1}$ of
data collected by the CDF II detector. The results discussed in this paper have halved the
expected limit of the previous search [11]. Half of the improvement comes from additional
data, and the expanded signal region contributes an additional 17%. The 18% reduction of the
b-jet energy resolution, the new jet width measurement, the improved QCD multi-jet modeling,
and the additional variables and improved training of the Higgs neural network contribute
another 24%. The combination of multivariate techniques improved the best
signal-to-background ratio from 0.0007, if the $m_{bb}$ distribution alone were used for the
search, to 0.006, which is almost a ten-fold increase. No significant Higgs boson signal is
observed, and upper limits are set on the Higgs boson production cross section relative to the
SM rate as a function of the Higgs boson mass in the range 100-150 GeV/$c^2$. For a
125 GeV/$c^2$ Higgs boson, the 95% CL expected (observed) limit is 11.0 (9.0) times the
expected SM rate. This search is CDF's fourth most sensitive $H \to b\bar{b}$ search; it is more
sensitive than CDF's $t\bar{t}H$ search [29] and similar to CDF's $H \to \gamma\gamma$ search [30],
which have expected limits of 12.6 and 9.9 for a 125 GeV/$c^2$ Higgs boson, respectively. CDF has
also developed an improved algorithm to identify (tag) b jets [31], which improves the b-tagging
rate from 39% to 54% and was used in the latest $ZH \to \ell\ell b\bar{b}$ [32] and
$WH \to \ell\nu b\bar{b}$ [33] searches. Adding the new b-jet tagger could potentially improve this
search's expected limit by an additional 40%, which would lower it to 7.9 times the expected
SM rate for a 125 GeV/$c^2$ Higgs boson. The all-hadronic search is a unique channel at the
Tevatron that has not been explored at the LHC. The improvements described in this paper,
such as the data-driven QCD multi-jet prediction, b-jet energy corrections, jet width,
and two-stage NN, can be applied to $H \to b\bar{b}$ searches and other multi-jet analyses at the
LHC.
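
The quoted figures can be cross-checked with simple arithmetic. The sketch below assumes that a 40% improvement in sensitivity corresponds to dividing the expected limit by 1.40. This reading reproduces the quoted 7.9 but is an interpretation, not a statement from the text.

```python
expected_limit_now = 11.0      # times the SM rate at 125 GeV/c^2, from the text
tagger_improvement = 0.40      # quoted potential gain from the new b-jet tagger

# ASSUMPTION: "improve by 40%" means the limit is divided by 1.40.
projected_limit = expected_limit_now / (1.0 + tagger_improvement)

s_over_b_mbb_only = 0.0007     # best S/B using the m_bb distribution alone
s_over_b_multivariate = 0.006  # best S/B with the multivariate techniques
s_over_b_gain = s_over_b_multivariate / s_over_b_mbb_only
```

Under this assumption the projected limit rounds to 7.9, and the signal-to-background gain is between eight and nine, consistent with the description "almost a ten-fold increase".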